Implementation-wise, this is super simple, but the design aspects of it have been bugging the shit out of me for a wide variety of reasons. So i want you guys to tell me an elegant, non-embarrassing way to do this.

*** Background ***
This is for my AI 2 - Advanced Expert Systems class. i'm writing a CBR system to help shoppers buy a PC. CBR = case-based reasoning, which here means nearest neighbor: calculating which "answer" is closest to what you're asking for.

Ex. #1 - you ask for a computer that has a 1GHz CPU, 128MB RAM and a 30GB HD. i look at all the systems we sell and tell you what we have that's closest (i.e., here's a system that's a 92% match)

Ex. #2 - on a scale of 1 to 5, you want a computer with speed=5, storage=3 and cost=1 (in other words, i want something fast and cheap with an avg size HD)
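
A minimal sketch of the nearest-neighbor scoring described above. It assumes every trait is numeric, that the distance on each trait is normalized by that trait's range across the catalog, and that all traits carry equal weight. The class name, method names and sample figures are hypothetical, not the actual engine:

```java
// Hypothetical sketch: similarity = average of (1 - normalized distance) over all traits.
class NearestNeighborSketch {

    /** Returns a score in [0, 1]; 1.0 means the candidate matches the request exactly. */
    static double similarity(double[] request, double[] candidate, double[] min, double[] max) {
        double total = 0.0;
        for (int i = 0; i < request.length; i++) {
            double range = max[i] - min[i];
            // A trait with no spread cannot tell systems apart, so count it as a perfect match.
            double distance = (range == 0.0) ? 0.0 : Math.abs(request[i] - candidate[i]) / range;
            total += 1.0 - Math.min(distance, 1.0);
        }
        return total / request.length;
    }

    /** Index of the catalog entry with the highest similarity to the request. */
    static int closest(double[] request, double[][] catalog, double[] min, double[] max) {
        int best = 0;
        for (int i = 1; i < catalog.length; i++) {
            if (similarity(request, catalog[i], min, max)
                    > similarity(request, catalog[best], min, max)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Traits: CPU (GHz), RAM (MB), HD (GB). All figures are made up for illustration.
        double[] min = {0.5, 64, 5};
        double[] max = {1.5, 512, 70};
        double[][] catalog = {
            {0.8, 128, 20},
            {1.0, 128, 40},
            {1.5, 512, 70},
        };
        double[] request = {1.0, 128, 30};
        int best = closest(request, catalog, min, max);
        System.out.printf("Best match: system %d (%.0f%% match)%n",
                best, 100 * similarity(request, catalog[best], min, max));
    }
}
```

The 1-to-5 style of request from Ex. #2 would fit the same shape if each rating were first mapped onto the trait's range.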

For portability reasons, this is a Java (not JSP) app. For testing purposes, there is a debug interface that dumps all sorts of data. One of the tabs is a data breakdown screen. This screen shows all the different values for each property. It's a nifty little grid that looks kinda like:

       Cost                  HD                  Manufacturer
 100   $3,500   1/10    100   70GB   2/10    Compaq   4/10
  90   $3,000   1/10     90   60GB   0/10    Dell     2/10
  80   $2,500   4/10     80   50GB   0/10    HP       4/10
  70   $2,000   2/10     70   40GB   4/10
 ...etc...
   0     $500   1/10      0    5GB   1/10

OK, so we sell 10 different types of PCs from 3 manufacturers. Cost, HD and Manufacturer are properties of each PC. The most expensive PC costs $3,500. The cheapest is $500

The data breakdown shows 3 things. First is the percentile ranking. If i do 0-100 in increments of 10, that's 11 breakpoints, which just seems wrong for some reason, but oh well. Second column is the value at that breakpoint. 100=max, 0=min, the rest you calculate ((range/100)*percentile + min). Third column is how many values fall into that percentile (value = breakpoint +/- 5 percentile points, so in the sample grid above systems costing $2,251 to $2,750 would be lumped together in the 80%/$2,500 breakpoint)
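
A rough sketch of how the numeric breakdown could be computed. It assumes 11 breakpoints separated by 10 equal intervals, a breakpoint value of min + range * percentile / 100, and that each value is counted at its nearest breakpoint. The class name, method names and prices are hypothetical:

```java
// Hypothetical sketch of the numeric breakdown: 11 breakpoints (0, 10, ..., 100)
// separated by 10 equal intervals between the minimum and maximum value.
class NumericBreakdownSketch {
    static final int STEP = 10;                    // percentile increment
    static final int BREAKPOINTS = 100 / STEP + 1; // 11 breakpoints

    /** Value at a breakpoint: min + range * percentile / 100. */
    static double valueAt(int percentile, double min, double max) {
        return min + (max - min) * percentile / 100.0;
    }

    /** counts[i] = number of values whose nearest breakpoint is percentile i * STEP. */
    static int[] countPerBreakpoint(double[] values) {
        double min = values[0];
        double max = values[0];
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
        }
        int[] counts = new int[BREAKPOINTS];
        for (int i = 0; i < values.length; i++) {
            // Position of the value within the range, from 0.0 to 1.0.
            double position = (max == min) ? 0.0 : (values[i] - min) / (max - min);
            // Rounding to the nearest breakpoint gives the "+/- 5" window.
            int index = (int) Math.round(position * (BREAKPOINTS - 1));
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up prices for ten systems.
        double[] costs = {500, 900, 1400, 2000, 2100, 2500, 2600, 2700, 3000, 3500};
        int[] counts = countPerBreakpoint(costs);
        for (int i = BREAKPOINTS - 1; i >= 0; i--) {
            System.out.printf("%3d  $%,.0f  %d/%d%n",
                    i * STEP, valueAt(i * STEP, 500, 3500), counts[i], costs.length);
        }
    }
}
```

With this spacing the breakpoints for a $500 to $3,500 range sit $300 apart, so the dollar figures differ from the illustrative grid above.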

For non-numeric values (Bools like HasDVDPlayer and strings like Manufacturer), we blow off the percentiles and just list values and frequency
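
For the non-numeric case, a frequency count is probably all that's needed. A small sketch, assuming the values arrive as strings (a Bool would go through String.valueOf first); the class and method names are hypothetical and the sample data just mirrors the Manufacturer column above:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: for Bools and strings, skip percentiles and count each distinct value.
class CategoricalBreakdownSketch {

    /** Maps each distinct value to the number of items that carry it, sorted by value. */
    static Map<String, Integer> frequencies(String[] values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(values[i], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] manufacturers = {"Compaq", "Dell", "HP", "Compaq", "HP",
                                  "Compaq", "HP", "Dell", "Compaq", "HP"};
        for (Map.Entry<String, Integer> e : frequencies(manufacturers).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue() + "/" + manufacturers.length);
        }
    }
}
```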

The point of this whole exercise is to see the spread and allocation of our values so that we can see if the predictions made by the system seem reasonable. Plus it explains a bit of the voodoo of CBR, which is good for an academic tool that's going to be graded


*** The Question ***
i know how to implement this using spaghetti code. i know how to do this by hardcoding in a bunch of assumptions. What i'm looking for is an _elegant_ solution

Right now, i have 3 classes: DataBreakdownForTraits, DataBreakdownForTrait and BreakPoint. A BreakPoint has the properties Percentile, Value and Count. A DataBreakdownForTrait has a collection of BreakPoints, all for a specific trait (cost or RAM or HD, etc.). Which means DataBreakdownForTraits is a collection of DataBreakdownForTrait objects (one per trait). Attached is a very crude class diagram i threw together in ArgoUML

(as a side note, the CBR engine i wrote already has an Items class which holds Item objects. Each Item has a collection of Traits, and each Trait's name and datatype are described by a TraitDescriptor stored in the system's TraitDescriptors collection. This is all done because this is a data driven system where the properties of an Item are not known until run time)

Using this would be something like:
	dataBreakdownForTraits.parse( items )
	for each dataBreakdownForTrait in dataBreakdownForTraits
		for each breakPoint in dataBreakdownForTrait 
			showBreakPoint( breakPoint )
assuming Java actually supported perl's nice for..each syntax :)
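
For what it's worth, Java 5 and later do have an enhanced for loop, so the usage above maps almost directly onto real code once the collection classes implement Iterable. A rough sketch of the three classes under that assumption; the constructors, the add methods, the describe helper and the BreakdownDemo class are hypothetical, and parse( items ) is left out:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A plain value holder; percentile is null for non-numeric traits.
class BreakPoint {
    final Integer percentile;
    final String value;
    final int count;

    BreakPoint(Integer percentile, String value, int count) {
        this.percentile = percentile;
        this.value = value;
        this.count = count;
    }
}

// All breakpoints for one trait (cost, RAM, HD, ...).
class DataBreakdownForTrait implements Iterable<BreakPoint> {
    final String traitName;
    private final List<BreakPoint> breakPoints = new ArrayList<>();

    DataBreakdownForTrait(String traitName) { this.traitName = traitName; }

    void add(BreakPoint breakPoint) { breakPoints.add(breakPoint); }

    @Override
    public Iterator<BreakPoint> iterator() { return breakPoints.iterator(); }
}

// One DataBreakdownForTrait per trait.
class DataBreakdownForTraits implements Iterable<DataBreakdownForTrait> {
    private final List<DataBreakdownForTrait> breakdowns = new ArrayList<>();

    void add(DataBreakdownForTrait breakdown) { breakdowns.add(breakdown); }

    @Override
    public Iterator<DataBreakdownForTrait> iterator() { return breakdowns.iterator(); }
}

class BreakdownDemo {
    /** Formats one grid row; the percentile column is skipped when absent. */
    static String describe(BreakPoint bp) {
        String prefix = (bp.percentile == null) ? "" : bp.percentile + "\t";
        return prefix + bp.value + "\t" + bp.count;
    }

    public static void main(String[] args) {
        DataBreakdownForTrait cost = new DataBreakdownForTrait("Cost");
        cost.add(new BreakPoint(100, "$3,500", 1));
        cost.add(new BreakPoint(90, "$3,000", 1));
        DataBreakdownForTrait maker = new DataBreakdownForTrait("Manufacturer");
        maker.add(new BreakPoint(null, "Compaq", 4));

        DataBreakdownForTraits dataBreakdownForTraits = new DataBreakdownForTraits();
        dataBreakdownForTraits.add(cost);
        dataBreakdownForTraits.add(maker);

        for (DataBreakdownForTrait dataBreakdownForTrait : dataBreakdownForTraits) {
            System.out.println(dataBreakdownForTrait.traitName);
            for (BreakPoint breakPoint : dataBreakdownForTrait) {
                System.out.println("\t" + describe(breakPoint));
            }
        }
    }
}
```

Keeping the lists private and exposing only an iterator means callers can walk the breakdown without being able to modify it from outside.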


*** Wrap Up ***
This whole thing will work but something about it seems off.

First, i know many OO types complain about using summary objects like i do - i ought to be able to do all this, they'd say, just using the properties of the entity objects. And they're right, but the only way i can picture it ends up in massive OO spaghetti code. The summarization objects help me avoid 300-line methods (ick).

Second, i probably don't need a BreakPoint object to hold just the value, frequency and, sometimes, percentile rank, and said object is pretty darn stupid (no methods, just property accessors; it's basically a simple data type), but it's a hell of a lot easier to implement than trying to shove each individual value in a hashmap or synchronized ArrayLists or some such.

Third, the names aren't great, but i've hit a mental wall and this was the best i came up with

i got this feeling i'm missing something big and really, really simple and obvious that makes much of this problem go away. Have i completely botched this? Any pointers on making this prettier? Or am i actually on the right track?

Also, ArgoUML has an auto-critique feature which complains about your class diagrams. It's complaining about my naming conventions re: collections and the collected. It does not like Item and Items, Trait and Traits (critique: names are too similar, will be confusing for developer). That's the way i've always done collections and it always seemed to work for me. Anyone else think that's a problem? Alternatives?

Darrin, my UML syntax is pretty ugly, but i ain't not knowin' Argo too well yet and only put a few minutes' effort into this (notice the public attributes on BreakPoint!), but it does have a single path navigation, composition/aggregation relationship, and you know how much i *despise* UML accessor adornments. Got any comments on the way the diagram looks?

